Hostexec: Enable building with cmake4 #298
base: amd-mainline
The underlying issue was that I forgot to clean the cache directory before running the test, so the test sometimes ran against a dirty cache and failed spuriously. Since the code runs only a single comgr action that converts spirv->bc, the cache should contain exactly 2 files:
* the bitcode
* the cache timestamp
In this patch, we add a new action, AMD_COMGR_ACTION_COMPILE_SPIRV_TO_RELOCATABLE, which accepts a set of .spv files, translates them to .bc files, extracts any embedded @llvm.cmdline flags, and then compiles them to a set of relocatable .o files.
Note that this is not an NFC change because the test case `llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll` has been updated due to the recent SGPR layout change. The 32 CSR SGPRs in `callee_need_to_spill_fp_exec_to_memory` have been adjusted to reflect this update. Change-Id: I332a721e7e8feaa5491c63228ecb42759e4d979d
This PR updates the SGPR layout to a striped caller/callee-saved design, similar to the VGPR layout. To ensure that s30-s31 (return address), s32 (stack pointer), s33 (frame pointer), and s34 (base pointer) remain callee-saved, the striped layout starts from s40, with a stripe width of 8. The last stripe is 10 wide instead of 8 to avoid ending with a 2-wide stripe. Fixes llvm#113782. Change-Id: I6fe8fca8b70985a8775ec04d93b460333533d2bb
…#128170) Fixes SWDEV-515029
For the hipBinNVPtr_ and hipBinAMDPtr_ members: the destructor of the base class was not marked as virtual, although the destructors of the derived classes are. The objects are deleted through a pointer to the base class, so only the base class destructor is called and the derived class destructors are skipped. This results in strange memory behaviour detected by ASAN. Solves SWDEV-516418
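The destructor fix described above can be illustrated with a minimal sketch. The class names `HipBinBaseSketch` and `HipBinAmdSketch` and the counter are hypothetical stand-ins, not the actual hipBin sources; the point is only that a virtual base destructor makes `delete` through a base pointer run the derived destructor as well.

```cpp
// Hypothetical stand-ins for the hipBin base and derived classes.
static int derived_dtor_calls = 0;

class HipBinBaseSketch {
public:
    // Virtual, so deleting through a base pointer also runs the derived destructor.
    // Without `virtual`, that delete would be undefined behaviour.
    virtual ~HipBinBaseSketch() = default;
};

class HipBinAmdSketch : public HipBinBaseSketch {
public:
    ~HipBinAmdSketch() override { ++derived_dtor_calls; }
};

// Deletes a derived object through a base-class pointer and
// returns how many derived destructors ran.
int destroy_through_base() {
    derived_dtor_calls = 0;
    HipBinBaseSketch* p = new HipBinAmdSketch();
    delete p;
    return derived_dtor_calls;
}
```

With the virtual destructor in place, `destroy_through_base()` reports that the derived destructor ran once.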
Also archive the Comgr V3 Release notes, and start a new document for Comgr V4 changes. Change-Id: I25137c174bd70caafe9b3c26d3a956331e0e9dfc
…126058) (llvm#3162) GlobalISel already handles undefined workitem.id.{x,y,z} intrinsics, but SelDAG failed in AMDGPUISelLowering.cpp due to a failed assertion in `AMDGPUTargetLowering::loadInputValue`: `Arg && "Attempting to load missing argument"`. This commit changes the behavior of SelDAG to use a zero constant instead. This LLVM defect was identified via the AMD Fuzzing project. Cherry-picked from bcba311 Fixes the `Arg && "Attempting to load missing argument"` assert in Numba from SWDEV-543227 Co-authored-by: Robert Imschweiler <[email protected]>
HIP runtime support for compressed bundle format v3 is in place, therefore switch the default compressed bundle format to v3 in compiler. This allows both compressed and decompressed fat binary size to exceed 4GB by default. Environment variable COMPRESSED_BUNDLE_FORMAT_VERSION=2 can be used for backward compatibility for older HIP runtimes not supporting v3. Fixes: SWDEV-548879
…t_fail() (llvm#144886) (llvm#3189) Modifications to reapply the commit: * Add noexcept only after C++11 on __glibcxx_assert_fail * Remove vararg version of __glibcxx_assert_fail And doc CP. Issue [SWDEV-518041](https://ontrack-internal.amd.com/browse/SWDEV-518041) & doc task [SWDEV-538485](https://ontrack-internal.amd.com/browse/SWDEV-538485) --------- Co-authored-by: Juan Manuel Martinez Caamaño <[email protected]>
llvm#3457)…llvm#129037) When a read(first)lane is used on a binary operator and the intrinsic is the only user of the operator, we can move the read(first)lane into the operand if the other operand is uniform. Unfortunately, IC doesn't let us access UniformityAnalysis, so we can't truly check uniformity; we have to make do with a basic uniformity check that only allows constants or trivially uniform intrinsic calls. We can also do the same for unary and cast operators. Co-authored-by: Pierre van Houtryve <[email protected]>
…#3749) The workaround will be active only if the system doesn't have pcie atomics Co-authored-by: Andryeyev, German <[email protected]>
…tributor run (llvm#155246) (llvm#3772) We do not need this in the attributor, because `ST.getWavesPerEU` accounts for both the waves-per-eu and flat-workgroup-size attributes. If the waves-per-eu values are not valid, it drops them. In the attributor, we only need to propagate the values without using intermediate flat workgroup size values. Fixes SWDEV-550257. (cherry picked from commit ca03045)
…d integers. (llvm#3581) This patch extends the instruction combiner to simplify the construction of a packed scalar integer from a vector type, such as:
```llvm
target datalayout = "e"
define i32 @src(<4 x i8> %v) {
  %v.0 = extractelement <4 x i8> %v, i32 0
  %z.0 = zext i8 %v.0 to i32
  %v.1 = extractelement <4 x i8> %v, i32 1
  %z.1 = zext i8 %v.1 to i32
  %s.1 = shl i32 %z.1, 8
  %x.1 = or i32 %z.0, %s.1
  %v.2 = extractelement <4 x i8> %v, i32 2
  %z.2 = zext i8 %v.2 to i32
  %s.2 = shl i32 %z.2, 16
  %x.2 = or i32 %x.1, %s.2
  %v.3 = extractelement <4 x i8> %v, i32 3
  %z.3 = zext i8 %v.3 to i32
  %s.3 = shl i32 %z.3, 24
  %x.3 = or i32 %x.2, %s.3
  ret i32 %x.3
}
; ===============
define i32 @tgt(<4 x i8> %v) {
  %x.3 = bitcast <4 x i8> %v to i32
  ret i32 %x.3
}
```
Alive2 proofs (little-endian): [YKdMeg](https://alive2.llvm.org/ce/z/YKdMeg)
Alive2 proofs (big-endian): [vU6iKc](https://alive2.llvm.org/ce/z/vU6iKc)
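As a rough illustration of why the `@src` and `@tgt` forms agree on a little-endian target, the sketch below restates both in C++. The function names are hypothetical and chosen for this example only; this is not the instruction combiner's implementation, just the same arithmetic written at source level.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Mirrors @src: zero-extend each byte, shift it into place, and OR the pieces together.
std::uint32_t pack_by_shifts(const std::array<std::uint8_t, 4>& v) {
    std::uint32_t x = v[0];
    x |= static_cast<std::uint32_t>(v[1]) << 8;
    x |= static_cast<std::uint32_t>(v[2]) << 16;
    x |= static_cast<std::uint32_t>(v[3]) << 24;
    return x;
}

// Mirrors @tgt: reinterpret the four bytes as one 32-bit integer.
// The result depends on the host byte order, like a bitcast depends on the datalayout.
std::uint32_t pack_by_bitcast(const std::array<std::uint8_t, 4>& v) {
    std::uint32_t x;
    std::memcpy(&x, v.data(), sizeof x);
    return x;
}

// Reports whether the host stores the least significant byte first.
bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    std::uint8_t first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host the two functions return the same value for every input, which corresponds to the `"e"` datalayout case in the IR above.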
Co-authored-by: Amit Kumar Pandey <[email protected]> Co-authored-by: Hans Wennborg <[email protected]> Co-authored-by: Amit Pandey <[email protected]>
llvm#3870) …(llvm#3208) 'hsa_vmem_address_free'. Implement interception of 'hsa_amd_vmem_address_reserve_align' and 'hsa_vmem_address_free' so as to support ASan overflow errors for memory allocated via 'hipMallocManaged'. [Ticket: SWDEV-483895] --------- Co-authored-by: Amit Pandey <[email protected]>
Due to a botched merge, we currently emit volatile loads from feature predicate globals. These are never foldable, which breaks things. This does not apply to the upstream patch currently under review. Committing on behalf of GitHub user @AlexVlx
llvm#3577) ...(llvm#131167) Fixes SWDEV-514946 Co-authored-by: Emma Pilkington <[email protected]>
…lvm#3748) This along with IntrReadMem means that the Intrinsic only reads memory through the given argument ptr and its derivatives. This allows passes like Inliner to attach alias.scope to the call instruction as it sees that no other memory is accessed. Discovered via SWDEV-543741 --------- Co-authored-by: Matt Arsenault <[email protected]> Cherry-picked from 1d30f71 --------- Co-authored-by: choikwa <[email protected]>
…lvm#4011) Restrict mfma scale operands to VGPR only (VRegSrc_32) to work around a hardware design defect: for all Inline/SGPR constants, SP HW uses bits [30:23] as the scale. TODO: We may still be able to allow Inline Constants/SGPR, with a proper shift, to obtain potentially better performance. Fixes: SWDEV-548629
Co-authored-by: Thao, Vang <[email protected]>
Add reference to ROCm compiler reference, remove unused test file, update link in ENV topic
Bump the minimum required cmake version from 3.0 to 3.20.0 to enable building with cmake4. This is the same minimum required version as the parent directory "offload" uses.
I tested with the amd-llvm version; without this patch, the build with cmake 4.1.0 produces the following error:
With this patch applied, the LLVM build works with both cmake 3.28.3 and cmake 4.1.0.
After that, I used the LLVM build, with both cmake 3.28.3 and 4.1.0, to build the rest of the ROCm stack and PyTorch; kernel loading on AMD GPUs worked fine with my PyTorch and Triton test apps. The cmake version probably does not affect the lit tests, so they are likely not relevant here; they passed anyway for the command:
Not sure how to do more testing for this one.
There is no need for further testing on our side. The LLVM team has picked this up and is working on it, but this will go to an internal repo first.
3b36fc8 to ffd4bcc
Motivation
Commit ROCm/TheRock@267c4d9 enabled building `hostexec`, but it does not build with cmake 4. However, since ROCm/TheRock#1440, TheRock promises to build with cmake 4, so this needs to be fixed.
Technical Details
Bump the minimum required cmake version for hostexec from 3.0 to 3.20.0 to enable building with cmake4. This is the same minimum required version as the parent directory "offload" uses.